Overview

We propose to study the impact of COVID-19 infodemics and misinformation on Twitter. To do so, we extract recent popular tweets from specific locations across different countries. This helps us describe false information that is spread with the sole purpose of causing confusion and harm. We target hashtags such as #covid19, #misinformation, #fakenews and #disinformation to collect related posts and analyze how information processing and decision-making behaviors are compromised. We also perform sentiment analysis on the tweets to understand people's sentiments, which is crucial during this pandemic.

1 Twitter Data

We primarily have two datasets: one contains tweets from the onset of the pandemic, and the other contains very recent tweets (June 2021). Our main objective here is to find out how sentiments have changed over the months.

For security reasons, the skeleton code below, which extracts the tweets, uses fake credentials. For the analysis itself, we load our extracted tweets from an .rds file (Rul, n.d.).

library(rtweet)
library(dplyr)
library(tidyr)
library(twitteR)
library(tidytext)

appname <- "CovidDistress"
key <- "ogRXvxribQAEt9tJKQ1rEd0c0"
secret <- "HlvVRoFg73JJcpcGjYxUWBagWratEIrdagPCeaiToWTKa15vCO"
access_token <- "15914217-8YYyRRAxRBL0Vu9Y0tAjVFfPvdJdYByfmsiVpLEoD"
access_secret <- "oeXIkYHBTQpGRxZCKI4q67UN3L8PuJfwb0su6EOkIk22f" 

twitter_token <- create_token(
  app = appname,
  consumer_key = key,
  consumer_secret = secret,
  access_token = access_token,
  access_secret = access_secret,
  set_renv = TRUE)

corona_tweets <- search_tweets(q = "#covid19 OR #coronavirus", n=20000, include_rts=FALSE, lang="en", retryonratelimit = TRUE)

saveRDS(corona_tweets, "../data/tweets2021.rds")
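Since the skeleton above hard-codes (fake) credentials, one possible alternative is to read the real ones from environment variables so they never appear in the script. The following is a minimal sketch under that assumption; the helper read_twitter_credentials() and the four TWITTER_* variable names are hypothetical and not part of rtweet.

```r
# Hypothetical helper: reads Twitter credentials from environment variables.
# The variable names below are assumptions; set them e.g. in ~/.Renviron.
read_twitter_credentials <- function() {
  vars <- c(
    consumer_key    = "TWITTER_CONSUMER_KEY",
    consumer_secret = "TWITTER_CONSUMER_SECRET",
    access_token    = "TWITTER_ACCESS_TOKEN",
    access_secret   = "TWITTER_ACCESS_SECRET"
  )
  values <- Sys.getenv(vars, unset = NA)

  # Fail early and name the variables that are unset or empty.
  missing <- vars[is.na(values) | values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }

  # Return a list named after the create_token() arguments used above.
  as.list(setNames(values, names(vars)))
}

# Possible usage with the skeleton above (not run here):
# creds <- read_twitter_credentials()
# twitter_token <- create_token(
#   app = appname,
#   consumer_key = creds$consumer_key,
#   consumer_secret = creds$consumer_secret,
#   access_token = creds$access_token,
#   access_secret = creds$access_secret)
```

The helper stops with an explicit message when a variable is missing, which avoids sending an empty key to the API.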

We can now load the saved RDS file using the command below:

tweets2021_raw <- readRDS("../data/tweets2021.rds")

The dataset contains 35,725 tweets, which is more than we intended. This is because we set retryonratelimit to TRUE. The tweets are dated from June 17, 2021 to June 19, 2021.
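The tweet count and date range quoted above can be checked with a few lines of base R. The sketch below uses a small made-up data frame (toy_tweets) in place of tweets2021_raw; only the column names status_id and created_at are taken from the dataset, and the values are invented.

```r
# Toy stand-in for tweets2021_raw (values are made up for illustration).
toy_tweets <- data.frame(
  status_id = c("x1", "x2", "x3"),
  created_at = as.POSIXct(
    c("2021-06-17 08:00:00", "2021-06-18 12:30:00", "2021-06-19 15:24:23"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Number of tweets (one row per tweet).
n_tweets <- nrow(toy_tweets)

# Earliest and latest calendar day covered by the tweets.
date_span <- range(as.Date(toy_tweets$created_at))

print(n_tweets)
print(date_span)
```

Running the same two expressions on tweets2021_raw should give the figures reported for the real dataset.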

Here is a sample row from the dataset:

user_id status_id created_at screen_name text source display_text_width reply_to_status_id reply_to_user_id reply_to_screen_name is_quote is_retweet favorite_count retweet_count quote_count reply_count hashtags symbols urls_url urls_t.co urls_expanded_url media_url media_t.co media_expanded_url media_type ext_media_url ext_media_t.co ext_media_expanded_url ext_media_type mentions_user_id mentions_screen_name lang quoted_status_id quoted_text quoted_created_at quoted_source quoted_favorite_count quoted_retweet_count quoted_user_id quoted_screen_name quoted_name quoted_followers_count quoted_friends_count quoted_statuses_count quoted_location quoted_description quoted_verified retweet_status_id retweet_text retweet_created_at retweet_source retweet_favorite_count retweet_retweet_count retweet_user_id retweet_screen_name retweet_name retweet_followers_count retweet_friends_count retweet_statuses_count retweet_location retweet_description retweet_verified place_url place_name place_full_name place_type country country_code geo_coords coords_coords bbox_coords status_url name location description url protected followers_count friends_count listed_count statuses_count favourites_count account_created_at verified profile_url profile_expanded_url account_lang profile_banner_url profile_background_url profile_image_url
x4233818847 x1406271649062830080 2021-06-19 15:24:23 Vickeysclick No #VaccinationDrive at #Namakkal on #Sunday. @namakkal09 @Namakkalpolice #COVID19 Twitter for Android 83 FALSE FALSE 0 0 NA NA VaccinationDrive Namakkal Sunday COVID19 NA x1246397788293742593 x1113056726931009536 namakkal09 Namakkalpolice en NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA https://twitter.com/Vickeysclick/status/1406271649062830080 Vignesh Vijayakumar Daydreamer, Journalist, fact-checker... Tweets are personal& RT's aren't endorsements FALSE 183 538 0 702 10241 2015-11-20 10:46:08 FALSE NA http://abs.twimg.com/images/themes/theme1/bg.png http://pbs.twimg.com/profile_images/667662165390835712/ZmO8TbTB_normal.jpg

We also have a few other datasets that contain tweets from 2020 and tweets with other hashtags:

tweets2021_vaccine <- search_tweets(q = "#vaccine", n=10000, include_rts=FALSE, lang="en", retryonratelimit = TRUE)
tweets2021_vaccine_and_covid19 <- search_tweets(q = "#covid19 AND #vaccine", n=10000, include_rts=FALSE, lang="en", retryonratelimit = TRUE)
tweets2021_job <- search_tweets(q = "#job", n=10000, include_rts=FALSE, lang="en", retryonratelimit = TRUE)
tweets2021_job_covid19 <- search_tweets(q = "#covid19 AND #job", n=10000, include_rts=FALSE, lang="en", retryonratelimit = TRUE)
tweets2021_jobloss <- search_tweets(q = "#covid19 AND #jobloss", n=10000, include_rts=FALSE, lang="en", retryonratelimit = TRUE)
tweets2021_donate <- search_tweets(q = "#covid19 AND #donate", n=10000, include_rts=FALSE, lang="en", retryonratelimit = TRUE)

To be added:
1. English word cloud (old vs. new)
2. Frequency chart (old vs. new)
3. Positive and negative common words (old vs. new)
4. Sentiment analysis bar graph (old vs. new)
5. A world map showing the tweets worldwide (old vs. new, if possible)
6. German word cloud
7. Sentiment regarding vaccines: word cloud/bar graph
8. Preferred vaccine: word cloud/bar graph
9. Mental health: word cloud/bar graph
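As a possible starting point for the planned frequency chart and the old-versus-new sentiment comparison, here is a minimal sketch using tidytext and the Bing lexicon. The example tweets and the period labels are invented for illustration, and the choice of the Bing lexicon is an assumption, not necessarily the lexicon used in the final analysis.

```r
library(dplyr)
library(tidytext)

# Hypothetical tweets; "old" and "new" stand in for the 2020 and 2021 datasets.
toy_tweets <- tibble(
  period = c("old", "old", "new", "new"),
  text = c(
    "Scared and worried about the virus",
    "Terrible news, the lockdown is awful",
    "Happy and thankful after my vaccine",
    "Real progress, but still worried"
  )
)

# One row per word, with common English stop words removed.
words <- toy_tweets %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Word frequencies per period (basis for a frequency chart or word cloud).
word_counts <- words %>%
  count(period, word, sort = TRUE)

# Number of positive and negative words per period (Bing lexicon).
sentiment_counts <- words %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(period, sentiment)

print(word_counts)
print(sentiment_counts)
```

Keeping period as a column lets the same pipeline feed a side-by-side bar graph for the two datasets.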

2 Data Cleaning and Preparation

To explore the data and extract insights efficiently, we first clean the data and keep only the relevant columns. The raw dataset has the following 90 columns:

##  [1] "user_id"                 "status_id"              
##  [3] "created_at"              "screen_name"            
##  [5] "text"                    "source"                 
##  [7] "display_text_width"      "reply_to_status_id"     
##  [9] "reply_to_user_id"        "reply_to_screen_name"   
## [11] "is_quote"                "is_retweet"             
## [13] "favorite_count"          "retweet_count"          
## [15] "quote_count"             "reply_count"            
## [17] "hashtags"                "symbols"                
## [19] "urls_url"                "urls_t.co"              
## [21] "urls_expanded_url"       "media_url"              
## [23] "media_t.co"              "media_expanded_url"     
## [25] "media_type"              "ext_media_url"          
## [27] "ext_media_t.co"          "ext_media_expanded_url" 
## [29] "ext_media_type"          "mentions_user_id"       
## [31] "mentions_screen_name"    "lang"                   
## [33] "quoted_status_id"        "quoted_text"            
## [35] "quoted_created_at"       "quoted_source"          
## [37] "quoted_favorite_count"   "quoted_retweet_count"   
## [39] "quoted_user_id"          "quoted_screen_name"     
## [41] "quoted_name"             "quoted_followers_count" 
## [43] "quoted_friends_count"    "quoted_statuses_count"  
## [45] "quoted_location"         "quoted_description"     
## [47] "quoted_verified"         "retweet_status_id"      
## [49] "retweet_text"            "retweet_created_at"     
## [51] "retweet_source"          "retweet_favorite_count" 
## [53] "retweet_retweet_count"   "retweet_user_id"        
## [55] "retweet_screen_name"     "retweet_name"           
## [57] "retweet_followers_count" "retweet_friends_count"  
## [59] "retweet_statuses_count"  "retweet_location"       
## [61] "retweet_description"     "retweet_verified"       
## [63] "place_url"               "place_name"             
## [65] "place_full_name"         "place_type"             
## [67] "country"                 "country_code"           
## [69] "geo_coords"              "coords_coords"          
## [71] "bbox_coords"             "status_url"             
## [73] "name"                    "location"               
## [75] "description"             "url"                    
## [77] "protected"               "followers_count"        
## [79] "friends_count"           "listed_count"           
## [81] "statuses_count"          "favourites_count"       
## [83] "account_created_at"      "verified"               
## [85] "profile_url"             "profile_expanded_url"   
## [87] "account_lang"            "profile_banner_url"     
## [89] "profile_background_url"  "profile_image_url"

To get more meaningful insights, we use only the columns “text,” “hashtags” and “location,” and we specifically clean up the text and hashtags columns. Let’s start with a basic analysis of the top tweet locations.

tweets2021_raw %>% 
  filter(!is.na(location) & location != "") %>% 
  count(location, sort = TRUE) %>% 
  top_n(10)
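The column selection and clean-up described above could look roughly like the sketch below. The exact cleaning rules are not specified in the text, so the regular expressions and the helper clean_text() are assumptions; the toy tibble stands in for tweets2021_raw, where hashtags is a list column.

```r
library(dplyr)

# Toy stand-in for tweets2021_raw with only the three columns of interest.
toy_tweets <- tibble(
  text = c(
    "No #VaccinationDrive at #Namakkal on #Sunday. @namakkal09 #COVID19",
    "Stay safe! https://t.co/abc123 #covid19"
  ),
  hashtags = list(
    c("VaccinationDrive", "Namakkal", "Sunday", "COVID19"),
    "covid19"
  ),
  location = c("", "Berlin")
)

# Hypothetical cleaning helper (these rules are assumptions, not the report's).
clean_text <- function(x) {
  x <- gsub("https?://\\S+", " ", x)          # drop URLs
  x <- gsub("[@#]\\w+", " ", x)               # drop mentions and hashtags
  x <- gsub("[^[:alpha:][:space:]]", " ", x)  # drop digits and punctuation
  x <- tolower(x)
  trimws(gsub("\\s+", " ", x))                # collapse repeated whitespace
}

tweets_clean <- toy_tweets %>%
  select(text, hashtags, location) %>%
  mutate(
    text = clean_text(text),
    hashtags = lapply(hashtags, tolower)      # treat COVID19 and covid19 alike
  )

print(tweets_clean$text)
```

Hashtags are removed from the text here because they are kept separately in the hashtags column; whether to keep them in the text depends on the later analysis.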

It is important to note, however, that the Twitter search API is based on relevance and not completeness: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/overview

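# Install emo from GitHub if it is not already installed.
#install.packages("devtools")
#devtools::install_github("hadley/emo")
library(emo)

# Count the ten most frequently used emojis in the tweet text.
tweets2021_raw %>%
  mutate(emoji = ji_extract_all(text)) %>%
  unnest(cols = c(emoji)) %>%
  count(emoji, sort = TRUE) %>%
  top_n(10)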

References

Rul, Céline Van den. n.d. “A Guide to Mining and Analysing Tweets with R.” https://towardsdatascience.com/a-guide-to-mining-and-analysing-tweets-with-r-2f56818fdd16.